ABSTRACT

The pluripotency factor Lin28 is a highly conserved 
protein comprising a unique combination of RNA- 
binding motifs, an N-terminal cold-shock domain 
and a C-terminal region containing two retroviral- 
type CCHC zinc-binding domains. An important 
function of Lin28 is to inhibit the biogenesis of the 
let-7 family of microRNAs through a direct inter- 
action with let-7 precursors. Here, we systematically 
characterize the determinants of the interaction 
between Lin28 and pre-let-7g by investigating the 
effect of protein and RNA mutations on in vitro 
binding. We determine that Lin28 binds with high 
affinity to the extended loop of pre-let-7g and that 
its C-terminal domain contributes predominantly to 
the affinity of this interaction. We uncover remark- 
able similarities between this C-terminal domain 
and the NCp7 protein of HIV-1, not only in terms of 
primary structure but also in their modes of RNA 
binding. This NCp7-like domain of Lin28 recognizes 
a G-rich bulge within pre-let-7g, which is adjacent to 
one of the Dicer cleavage sites. We hypothesize that 
the NCp7-like domain initiates RNA binding and par- 
tially unfolds the RNA. This partial unfolding would 
then enable multiple copies of Lin28 to bind the 
extended loop of pre-let-7g and protect the RNA 
from cleavage by the pre-microRNA processing 
enzyme Dicer. 

INTRODUCTION 

MicroRNAs (miRNAs) are short single-stranded RNAs 
of ~22 nt found in virus, plant and animal species
that act as post-transcriptional regulators of mRNA ex- 
pression [for recent reviews, see (1-4)]. They are generated 



from a longer RNA, the primary transcript (pri-miRNA), 
by a multi-step process. The pri-miRNA is first cleaved by 
the microprocessor complex containing the endonuclease 
Drosha and the double-stranded RNA-binding protein 
DGCR8 to produce a 60-70-nt RNA hairpin known as
the precursor miRNA (pre-miRNA). After being exported 
to the cytoplasm, the pre-miRNA is further cleaved by the 
endonuclease Dicer to form a ~22-nt dsRNA. The single- 
stranded mature miRNA is then loaded into the RNA- 
induced silencing complex to regulate its target mRNAs. 

miRNAs play important roles in cell differentiation (5-7), 
and, in mammals, several miRNAs have been shown to 
act as oncogenes and tumor suppressors [reviewed in (8-13)]. 
Among those playing a role as tumor suppressors, the 
let-7 family of miRNAs have been extensively chara- 
cterized, and are known inhibitors of oncogenes such as 
RAS, MYC, HMGA2, and cyclin D1 (10). The let-7
miRNAs are often present in multiple copies in a single 
genome, with the mature let-7 being highly conserved 
across species. In human and mouse, there are 10 
mature let-7 family sequences (let-7a, let-7b, etc.) 
produced from 13 precursors. 

Although levels of let-7 pri-miRNAs are controlled by 
transcription factors, post-transcriptional regulation is 
critical in determining the levels of mature let-7 miRNAs 
(14-18). Recent studies in embryonic cells have highlighted
the importance of Lin28 in post-transcriptional
regulation of the let-7 family of miRNAs, where it 
acts as a selective inhibitor of let-7 miRNA maturation
(19-21). The various members of the let-7 family are not 
affected to the same degree by Lin28, with let-7a, let-7d 
and let-7g being among the most affected. Several mech- 
anisms have been proposed to explain the Lin28 inhibition 
of let-7 biogenesis. Lin28 was shown to interfere with the 
Drosha cleavage of pri-let-7 (16,19,21) and with the 
cleavage of pre-let-7 by Dicer (22,23). In addition, Lin28 
was shown to induce the uridylylation of pre-let-7 by the 
recruitment of TUT4 (Zcchc11), which leads to its
degradation (22,24-26). Although the relative importance
of these mechanisms in vivo has not been clearly established 
(27), they all involve the formation of a complex between 
Lin28 and the immature forms of the let-7 miRNA. 

Lin28 is a highly conserved protein of 209 amino acids 
known to be an important pluripotency factor (28), and its 
role in pluripotency is likely related to its function in let-7
biogenesis (19,29). Lin28 contains a unique set of
RNA-binding motifs (30,31): an N-terminal cold shock
domain (CSD) and a C-terminal region composed of 
two CCHC-type zinc-binding domains [ZBDs; (30)]. 
CSDs are found in several RNA- and DNA-binding 
proteins (32), whereas the CCHC-type ZBDs are most 
commonly found in retroviral nucleocapsid proteins, 
such as the NCp7 protein from HIV-1 (33). Although 
Lin28 has been shown to regulate the stability and translation
of selected mRNAs (34-37), it plays a central role in
regulating levels of mature let-7. 

Several in vivo and in vitro studies have sought to char- 
acterize the interaction between pre-let-7 and Lin28 
(19,20,23,24,38). It was demonstrated that both the CSD 
and the ZBDs of Lin28 are necessary for pre-let-7g 
binding in vitro and maturation inhibition in vivo (20). 
As determined by in vitro binding assays, Lin28 binds 
the extended terminal loop of pre-let-7g (20,38). 
Mutation of a conserved cytosine in this loop was 
shown to reduce its in vitro affinity for Lin28 (20). A
G-rich sequence at the 5'-end of the pre-let-7g terminal
loop was found to be strongly protected from ribonuclease 
cleavage by Lin28 (38). In addition, mutations of a few 
conserved nucleotides in the terminal loop make the 
immature miRNA resistant to Lin28 inhibition in P19 embryonal
carcinoma extract (19). Lin28 also binds the
extended terminal loop of pre-let-7a-2, and the sequence 
composing the mature miRNA (let-7a) can compete with 
pre-let-7a-2 binding for Lin28 (23). Moreover, a 
four-nucleotide 5'-GGAG-3' sequence important for
Lin28 binding and its uridylylation by Zcchc11 was identified
at the 3'-end of the terminal loop region of
pre-let-7a-1 (24). Although several studies have contributed
to establishing that the RNA-binding domains of
Lin28 are important for recognition of the extended ter- 
minal loop of pre-let-7, the key determinants of this inter- 
action have not been systematically defined. 

In this work, we used electrophoretic mobility shift assay
(EMSA), ribonuclease protection assay, in-line probing
and NMR spectroscopy with purified molecules to map 
the interaction between the pre-let-7g RNA and the Lin28 
protein from mouse. We determine that the C-terminal 
domain of Lin28 contributes predominantly to the high- 
affinity interaction with pre-let-7g and its sequence is very 
similar to the NCp7 protein of HIV-1. We also uncover 
several similarities in terms of RNA binding between 
NCp7 and the C-terminal domain of Lin28. 

MATERIALS AND METHODS 

Plasmids 

Lin28 expression vectors are derived from pGEX4T (GE 
Healthcare) and were constructed using the Lin28 cDNA 



from Mus musculus (Open Biosystems BC068304). Vectors 
for RNA transcription were derived either from the 
pARiBo1 plasmid or the pRSA-VS plasmid (39).
Mutant vectors were prepared using the Stratagene
QuikChange II site-directed mutagenesis method or by
standard cloning of restriction fragments. All plasmids 
created for this study were verified by DNA sequencing. 

RNA preparation for biochemical characterization and 
NMR studies 

Most RNAs used here were transcribed in vitro as ARiBo- 
tagged precursors and purified by batch affinity purifica- 
tion (39,40). In one case (TL-let-7g), the RNA was 
synthesized in vitro as a precursor with a VS ribozyme substrate
at its 3'-end and purified as described previously (41).
For biochemical characterization (gel-shift, footprinting
and in-line probing), the RNAs were [5'-32P]-labeled and
further purified by 20% denaturing gel electrophoresis
(42). For NMR studies, the RNAs were concentrated
and exchanged with an Amicon Ultra-15 3000 NMWL
(Millipore) in NMR buffer (10 mM d18-HEPES at pH
6.4, 50 mM NaCl, 0.05 mM NaN3 and 10% D2O).

Protein expression and purification for biochemical 
characterization 

Lin28 and related mutants were expressed in Escherichia 
coli strain BL21 cells (Stratagene). The bacterial cultures 
were grown in LB medium at 37°C and induced with 
1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for
4 h at 30°C. The cells were harvested by centrifugation
and resuspended in binding buffer [25 mM Tris, pH 8.0,
1 M NaCl, 0.1% NP-40 alternative (Calbiochem) and
1 mM DTT] supplemented with Complete EDTA-free
protease inhibitor (Roche) and 10 U/ml of DNase I recombinant
RNase-free (Roche). The cells were lysed by
French press and centrifuged at 100 000 g for 1 h at 4°C.
The supernatant was incubated for 1 h at 4°C with GSH- 
Sepharose 4B resin (GE Healthcare). After incubation, the 
resin was washed three times with the binding buffer and 
three times with the S7 buffer (50 mM Tris, pH 8.0, 50 mM 
NaCl, 5 mM CaCl2 and 1 mM DTT). The washed resin
was resuspended in S7 buffer and incubated overnight at 
room temperature with 5 U/ml of Nuclease S7 (Roche). 
The resin was subsequently washed three times with S7 
buffer and incubated 1 h at room temperature with 
100 U of thrombin (Calbiochem). The eluted protein was
dialyzed 4 h at 4°C in 2 l of 20 mM Tris, pH 8.0, 2 M
NaCl, 1 mM DTT and overnight at 4°C in 20 mM Tris,
pH 8.0, 1 M urea, 200 mM NaCl and 1 mM DTT. The
dialyzed protein was loaded on an SP-Sepharose
high-performance column (GE Healthcare) equilibrated
with FPLC-A (20 mM Tris, pH 8.0 and 1 mM DTT).
The protein was eluted from the column using a
gradient (from 0% to 100% over 525 mL) of FPLC-B
(20 mM Tris, pH 8.0, 2 M NaCl and 1 mM DTT). The
fractions containing the protein were combined,
concentrated with an Amicon Ultra-15 3000 NMWL
(Millipore) and dialyzed in storage buffer (100 mM Tris,
pH 7.6, 100 mM NaCl, 20% glycerol and 2 mM DTT).
The NCp7 protein was expressed and purified as described
previously (43). All proteins purified for this study were
verified by mass spectrometry. 

Protein expression and purification for NMR studies 

For NMR studies, uniform 15N- and 15N/13C-labeling was
obtained by growing the cells in minimal media containing
15N-labeled NH4Cl and 13C6-glucose as the sole sources
of nitrogen and carbon, respectively. Protein purification 
was conducted as described above, but with the following 
modifications. The selected fractions from the SP- 
Sepharose column were dialyzed in 5% acetic acid, con- 
centrated on a rotary evaporator and purified on a Vydac 
C4 reverse-phase HPLC column using an acetonitrile 
gradient (from 15% to 35% over 335 ml) in 0.05% TFA.
After HPLC purification, the proteins were refolded in the 
presence of zinc, as described previously (44). 

Electrophoretic mobility shift assay 

For EMSA, the 32P-labeled RNA was first heated and
snap-cooled (heated 2 min at 95°C and snap-cooled on
ice for 5 min) to promote hairpin formation. The protein
samples were diluted in EMSA buffer (50 mM Tris, pH
7.6, 50 mM NaCl, 10% glycerol, 0.05% NP-40 alternative
and 2 mM DTT) and their concentrations were adjusted to
span from 0.02× to 50× the estimated Kd. The binding
reactions (20 ml) were initiated by mixing 1 pM
of 32P-labeled RNA with the diluted proteins and
incubated at 4°C for 30 min. For each determination, 
~14 binding reactions were loaded directly on an 8% 
native polyacrylamide gel (37.5:1 polyacrylamide/ 
bisacrylamide) and run in Tris-Glycine buffer (25 mM 
Tris-Base and 200 mM glycine) at 200 V for 2 h with
active water cooling in the cold room. The gels were 
fixed in 50% methanol and 10% acetic acid for 1 h, 
washed 15 min in 30% ethanol, quickly rinsed with H2O 
and exposed overnight to a storage phosphor screen 
(Bio-Rad). The 32P-labeled RNA was visualized with a
Bio-Rad Molecular Imager FX densitometer, and band 
intensities were quantified using the QuantityOne 
software (version 4.6.6 from Bio-Rad). The fraction of 
bound RNA was plotted against protein concentration, 
and the data were fitted to the one-site binding equation 
or to the Hill equation (only for the two cases in Table 1) 
by nonlinear regression analysis within the Origin 7 SR4 
version 7.0552 software (OriginLab, MA, USA). For each 
protein-RNA complex, at least three independent de- 
termination experiments were performed. The reported 
Kd values and their errors are, respectively, the average values
and the standard deviations from these multiple 
experiments. 
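
The two binding models named above can be written out explicitly. The sketch below is only an illustration and not the Origin workflow used in the study: it assumes that the free protein concentration is well approximated by the total concentration (reasonable with RNA at 1 pM), that Kd in the Hill form is the protein concentration at half saturation, and it replaces nonlinear regression with a simple log-spaced grid search. All function names, grid bounds and the synthetic data are hypothetical.

```python
import math

def one_site(protein_nM, kd_nM):
    """Fraction of RNA bound for one non-cooperative site."""
    return protein_nM / (kd_nM + protein_nM)

def hill(protein_nM, kd_nM, n):
    """Fraction bound under the Hill model (n = Hill coefficient).

    Assumption: kd_nM is the protein concentration at half saturation.
    """
    return protein_nM ** n / (kd_nM ** n + protein_nM ** n)

def fit_one_site(conc_nM, frac_bound, lo=1e-3, hi=1e3, steps=2000):
    """Least-squares Kd from a log-spaced grid search.

    This stands in for proper nonlinear regression; lo, hi and steps
    are arbitrary illustrative choices.
    """
    best_kd, best_sse = lo, math.inf
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum((one_site(p, kd) - f) ** 2
                  for p, f in zip(conc_nM, frac_bound))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# Synthetic, noise-free data generated from a known Kd (not measured values).
conc = [0.02, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0, 50.0]
frac = [one_site(p, 1.3) for p in conc]
print(round(fit_one_site(conc, frac), 2))
```

With n > 1 the Hill curve rises more steeply around Kd than the one-site curve, which is the kind of difference that separates the two fits when binding is cooperative.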

In-line probing assay 

In-line probing assays were performed as described previously
(45). The 32P-labeled RNA was visualized with a
Bio-Rad Molecular Imager FX densitometer, and band 
intensities were quantified using the Image Lab software 
(version 3.0 from Bio-Rad). 



RNA footprinting assay 

For RNase footprinting, the 32P-labeled RNA was first
heated and snap-cooled to promote hairpin formation.
The Lin28(119-180) protein was diluted to various concentrations
in EMSA buffer. The protein was first incubated
with 1 nM of 32P-labeled RNA (10 ml total volume) for
30 min at 4°C. Then, 1 U of T1 ribonuclease from
Aspergillus oryzae (Sigma) was added and the incubation 
continued for 15 min at 4°C. The reaction was stopped 
by the addition of Precipitation/Inactivation buffer 
(Ambion), incubation for 15 min at -20°C and centrifugation
at 16 000 g for 15 min. The RNA pellet was
dissolved in Gel loading buffer II (Ambion), loaded on a
20% polyacrylamide/7 M urea sequencing gel and run at
1900 V for 5 h. The sequencing gel was exposed 2 h to a
storage phosphor screen (Bio-Rad). The 32P-labeled RNA
bands were visualized and quantified as for the EMSA 
assay. 

NMR Spectroscopy 

For NMR studies, the following samples were prepared 
in NMR buffer: 1.0 mM 15N-labeled Lin28(136-180);
1.3 mM 15N-labeled Lin28(119-180); 1.1 mM 13C/15N-labeled
Lin28(119-180); 0.1 mM TL-let-7g:15N-labeled Lin28(119-180);
0.1 mM Δbulge TL-let-7g:15N-labeled Lin28(119-180); 0.1 mM
G34AG35A TL-let-7g:15N-labeled Lin28(119-180); 1.0 mM
TL-let-7g:15N-labeled Lin28(119-180); and 1.3 mM
TL-let-7g:13C/15N-labeled Lin28(119-180). For the TL-let-7g:
Lin28(119-180) complexes, the samples were prepared by titration
of Lin28(119-180) into a TL-let-7g sample. All NMR
experiments were collected on Varian Unity INOVA 500
and 600 MHz spectrometers equipped with a pulse-field
gradient unit and an actively shielded z gradient probe
(either a room-temperature probe or a cryogenic probe).
The backbone resonances (1H, 15N and 13C) of Lin28(119-180)
in the free and TL-let-7g-bound form were assigned
using the following NMR experiments collected at 35°C:
two-dimensional (2D) 1H-15N HSQC (46); three-dimensional
(3D) HNCACB (47-49); and 3D
(HB)CBCA(CO)NNH (48,49). 1H, 13C and 15N chemical
shifts were referenced to an external standard of 2,2- 
dimethyl-2-silapentane-5-sulfonic acid (DSS) at 0.00 ppm 
(50). NMR data were processed with NMRPipe/ 
NMRDraw (51) and analyzed with NMRView (52). 

RESULTS 

Lin28 recognizes the terminal loop of pre-let-7g with its 
C-terminal domain providing the most important energetic 
contribution 

To identify the domain(s) of Lin28 important for binding 
the let-7g precursor miRNA (pre-let-7g), we used EMSAs 
with purified recombinant proteins and in vitro transcribed 
RNAs (Figure 1). It was previously established that Lin28 
recognizes pre-let-7g and its terminal loop with a similar 
affinity [Kd of 1-2 μM; (20,38)]. Thus, we initiated our
study by determining the Kd of full-length Lin28
[Lin28(1-209)] for the terminal loop of pre-let-7g (TL-let-7g;
Figure 1D). We attempted to fit the data (Figure 2A)
Figure 1. The Lin28 protein, pre-let-7g RNA and related sequences used in this study. (A) Schematic representation of the primary structures of
Lin28 and deletion fragments. The gray boxes delineate sequences of known RNA-binding motifs: a cold shock domain (CSD) and a pair of
retroviral-type CCHC zinc-binding domains (ZBD1 and ZBD2). (B) Schematic representation of pre-let-7g, indicating the regions (gray boxes) from
which TL-let-7g and duplex let-7g GNRA were derived. (C and D) Primary and secondary structures of the (C) duplex let-7g GNRA and (D) TL-let- 
7g. Nucleotides within the mature miRNA sequence are in blue and non-natural nucleotides are shown in lowercase. In (D), site-specific mutations of 
TL-let-7g are in red and regions that were replaced by alternative structured elements are boxed. 



using the classical one-site binding equation, but the fit 
was rather poor (Figure 2C, red line). Using the Hill
equation, a much better fit was obtained (Figure 2C,
blue line), from which we derived a Kd of 0.13 ±
0.02 nM with a Hill coefficient of 2.9 ± 0.7 (Table 1). 
These results clearly demonstrate that Lin28 binds the 
terminal loop of pre-let-7g with much higher affinity 
than previously reported (20,38). 

Given this unexpected result, we verified the binding of 
Lin28 to the full-length pre-let-7g. We obtained a Kd of
0.15 ± 0.04 nM with a Hill coefficient of 2.7 ± 0.5 for this
interaction (Table 1), confirming that pre-let-7g and its
terminal loop have similar affinities for Lin28, as previously
established (20,38). As a control, we measured
the binding of Lin28 to an RNA that contains only
the miRNA stem of pre-let-7g (duplex let-7g GNRA;
Figure 1C) and obtained a 60-fold lower affinity (Kd = 9 ±
3 nM), further supporting that the terminal loop is the
main determinant of Lin28 binding to pre-let-7g. 

Since Lin28 contains two different RNA-binding 
domains, we determined the affinity of Lin28 fragments 
containing either the CSD [Lin28(39-112)] or the two ZBDs
[Lin28(119-180)] for pre-let-7g, TL-let-7g and duplex let-7g
GNRA (Figure 1). For the N-terminal Lin28(39-112)
fragment comprising the CSD (Figure 1A), Kd values of
41 ± 10 nM and 126 ± 41 nM were obtained for pre-let-7g
and TL-let-7g, respectively, indicating significantly weaker
affinity (>250-fold) compared to the full-length protein. For the
duplex let-7g GNRA, only a minimum value could be
obtained (>250 nM), because of aggregation of the
Lin28(39-112) domain detected at concentrations higher
than 250 nM. In contrast, the Lin28(119-180) fragment containing
the ZBDs displays only slightly lower affinities
than full-length Lin28 toward pre-let-7g and TL-let-7g,
with Kd values of 0.6 ± 0.1 nM and 1.3 ± 0.3 nM, respectively
(Table 1, Figure 2B and C). These binding data for
Lin28(119-180) can be fitted to a classical one-site binding
equation (Figure 2C and Table 1). For the duplex let-7g
GNRA, no specific binding could be observed with
Lin28(119-180). Thus, compared to the full-length protein,
the C-terminal Lin28(119-180) fragment containing the two
ZBDs displays similar specificity toward RNAs derived
from pre-let-7g. Furthermore, the affinity between
Lin28(119-180) and TL-let-7g is only 10-fold weaker than
between the full-length Lin28 protein and pre-let-7g,
indicating that Lin28(119-180) and TL-let-7g encompass the
main determinants of the Lin28/pre-let-7g interaction. 
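
The fold differences quoted in this section follow directly from ratios of the Kd values in Table 1. A minimal sketch of that arithmetic is given below; the dictionary keys and the helper name `fold_weaker` are hypothetical labels chosen for illustration, and the numbers are the mean Kd values only (their reported errors are ignored).

```python
# Mean Kd values in nM, transcribed from Table 1.
KD_NM = {
    ("Lin28(1-209)", "pre-let-7g"): 0.15,
    ("Lin28(1-209)", "TL-let-7g"): 0.13,
    ("Lin28(1-209)", "duplex let-7g GNRA"): 9.0,
    ("Lin28(39-112)", "pre-let-7g"): 41.0,
    ("Lin28(39-112)", "TL-let-7g"): 126.0,
    ("Lin28(119-180)", "pre-let-7g"): 0.6,
    ("Lin28(119-180)", "TL-let-7g"): 1.3,
}

def fold_weaker(test, reference):
    """Ratio of two Kd values; a result above 1 means `test` binds more weakly."""
    return KD_NM[test] / KD_NM[reference]

# Reference complex: full-length Lin28 bound to pre-let-7g.
full_pre = ("Lin28(1-209)", "pre-let-7g")
for key in KD_NM:
    print(key, round(fold_weaker(key, full_pre), 1))
```

Because each Kd carries an experimental error, these ratios are best read as approximate factors rather than exact values.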

Similarities between the C-terminal domain of Lin28 and 
the HIV-1 NCp7 protein 

It has been previously noted that the C-terminal domain 
of Lin28 contains two ZBDs similar to those found in viral 
nucleocapsid proteins (30). Given the importance of this 
RNA-binding domain for pre-let-7g binding, we searched 
for proteins containing a similar domain in the Swiss-Prot 
database using BLAST (53). The nucleocapsid proteins 
from simian and human immunodeficiency viruses give 
the highest scores after Lin28 proteins from different 
species. The sequence alignment between the well- 
characterized HIV-1 nucleocapsid NCp7 and metazoan 
Lin28 sequences indicates significant similarities in the 
ZBDs, and also, surprisingly, in the N-terminal KR-rich 
domain (Figure 3). The percentage of sequence similarity 
to HIV-1 NCp7 is relatively high for both murine (45%) 
and human Lin28(123-180) (45%). All three proteins contain
identical zinc-chelating amino acids (CCHC) and spacing 
between these residues in the two ZBDs. In addition, the 
Figure 2. EMSA of TL-let-7g with Lin28(1-209) and Lin28(119-180).
(A) Typical EMSA performed with 1 pM of 5'-[32P]-labeled TL-let-7g
and increasing concentrations of Lin28(1-209) (0, 0.002, 0.010, 0.025,
0.050, 0.075, 0.10, 0.15, 0.20, 0.50, 1.0, 2.5 and 5.0 nM). (B) Typical
EMSA for TL-let-7g and increasing concentrations of Lin28(119-180) (0,
0.02, 0.10, 0.25, 0.50, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 5.0, 10, 25 and
50 nM). (C) The bound fraction of RNA is plotted against the total
concentration of protein. The data for binding of TL-let-7g to
Lin28(1-209) (squares) are fitted to both the one-site binding equation
(red line; Kd = 0.2 nM) and the Hill equation (blue line; Kd =
0.13 nM). The data for binding of TL-let-7g to Lin28(119-180) (dots)
are fitted to the one-site binding equation (green line; Kd = 1.3 nM).



Table 1. Dissociation constants (Kd in nM) of different domains of
Lin28 for various pre-let-7g constructs

RNA                   Lin28(1-209)         Lin28(39-112)   Lin28(119-180)
Pre-let-7g            0.15 ± 0.04          41 ± 10         0.6 ± 0.1
                      n = 2.1 ± 0.5 (a)
Duplex let-7g GNRA    9 ± 3                >200            n.b. (b)
TL-let-7g             0.13 ± 0.02          126 ± 41        1.3 ± 0.3
                      n = 2.9 ± 0.7 (a)

(a) In these cases, the Hill equation was used to derive the Kd values.
(b) No specific binding observed. The gel mobility shift assays display
smearing and multiple shifts, indicating non-specific binding.



spacing between the two ZBDs is similar with seven resi- 
dues in NCp7 of HIV-1 and eight residues in the Lin28 
sequences (Figure 3). Furthermore, this sequence 



similarity also involves several of the NCp7 residues 
from the KR-rich domain and the two ZBDs that contrib- 
ute to the RNA-binding interface as observed in the NMR 
structures of RNA/NCp7 complexes (33,54). 

To further investigate the binding of Lin28(119-180) to
TL-let-7g, we performed NMR chemical shift perturbation
experiments. In both the free and bound forms,
Lin28(119-180) displays a well-dispersed 1H-15N HSQC
spectrum (Figure 4A), indicating that both adopt a
stable homogeneous conformation in solution. Analysis
of the 1H and 15N chemical shifts reveals that 21 of the
56 amino acid residues analyzed display significant chemical
shift differences between the free and RNA-bound
form (Figure 4B; Δδ > 0.4 ppm). When mapped onto the
primary structure of Lin28(119-180), the residues showing
significant chemical shift differences are found in the
KR-rich domain, both ZBDs, as well as in the linker between
ZBD1 and ZBD2. These results indicate that all
these domains participate in RNA binding either by 
direct contact or through conformational rearrangement 
of the protein. 
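
The text does not state how the 1H and 15N shift changes were combined into a single Δδ value. The sketch below shows one possible way to flag perturbed residues, under the assumption of a Euclidean combination with an adjustable 15N weight. The `n_weight` default, the function names and the example shifts are hypothetical and are not taken from the study; only the 0.4 ppm cutoff comes from the text.

```python
import math

def combined_shift(d_h_ppm, d_n_ppm, n_weight=1.0):
    """Combine 1H and 15N shift changes into one value (assumed formula)."""
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)

def perturbed_residues(free, bound, threshold_ppm=0.4, n_weight=1.0):
    """Return residue numbers whose combined shift change exceeds the cutoff.

    free / bound: dict mapping residue number -> (1H ppm, 15N ppm).
    Only residues assigned in both states are compared.
    """
    hits = []
    for res in sorted(set(free) & set(bound)):
        d_h = bound[res][0] - free[res][0]
        d_n = bound[res][1] - free[res][1]
        if combined_shift(d_h, d_n, n_weight) > threshold_ppm:
            hits.append(res)
    return hits

# Hypothetical peak positions for two residues (illustration only).
free_shifts = {120: (8.00, 120.0), 121: (8.50, 118.0)}
bound_shifts = {120: (8.05, 120.1), 121: (8.90, 119.0)}
print(perturbed_residues(free_shifts, bound_shifts))
```

The choice of weighting changes which residues pass the cutoff, so any comparison with the 21 residues reported here would require the weighting actually used for Figure 4B.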

To determine if the Lin28(119-180) fragment could be
shortened while maintaining its affinity for the terminal
loop of pre-let-7g, we generated several N-terminal and
C-terminal deletions (Figure 1A). Of the three deletion
fragments that were expressed and purified [Lin28(136-180),
Lin28(119-140) and Lin28(119-160)], none binds TL-let-7g with
high affinity (Kd > 5 μM; Table 2), and thus Lin28(119-180)
constitutes the minimal domain required for TL-let-7g
binding. It is particularly striking that removal of the
first 17 amino acids encompassing the KR-rich domain
is as detrimental to binding as removal of one or two
ZBDs. To ensure that the absence of binding with
Lin28(136-180) is not due to protein misfolding, we compared
the 1H-15N HSQC spectrum of this fragment with that of
Lin28(119-180) (Supplementary Figure S1). The chemical
shift similarity between these two spectra indicates that
the ZBDs adopt a similar fold in Lin28(136-180) and
Lin28(119-180). In addition, the importance of the KR-rich
domain was further investigated using a mutant of
Lin28(119-180) in which all lysines and arginines of the
KR-rich domain are mutated (KR- with mutations
R122A, R123A, K125A, K127A, K131A, R132A,
R133A and K135G). As expected, Lin28(119-180) (KR-)
does not bind TL-let-7g with high affinity (Kd > 5 μM;
Table 2). Interestingly, we found that the NCp7 protein of
HIV-1 binds with the same affinity to TL-let-7g
(1.1 ± 0.3 nM) as Lin28(119-180) (Table 2). Thus, in addition
to sharing sequence similarity with NCp7, Lin28(119-180)
also uses both its KR-rich domain and ZBDs for RNA recognition
and binds with the same affinity as NCp7 to
TL-let-7g. To emphasize these similarities with
NCp7, we defined Lin28(119-180) as the NCp7-like domain
of Lin28. 

Global mapping of the interaction site using ribonuclease 
protection assay 

A ribonuclease protection assay was used to identify the 
region(s) of TL-let-7g interacting with the NCp7-like
domain. As a first step, in-line probing (Supplementary
Figure 3. Sequence similarity between the HIV-1 NCp7 and the C-terminal domain of Lin28. The sequences of HIV-1 NCp7 and Lin28 from Mus
musculus (mmu), Homo sapiens (hsa), Gallus gallus (gga), Xenopus laevis (xla) and Danio rerio (dre) were aligned using ClustalW2 (70). A consensus
sequence is given with the standard one-letter code in capital letters for amino acids, as well as the following notation: a, aromatic; h, hydrophobic;
p, polar; +, positively charged. The schematic representation of NCp7 highlights the domains that contribute to RNA binding: an N-terminal
KR-rich domain and two zinc-binding domains (ZBD1 and ZBD2). The residues of NCp7 in red and blue make direct contact with zinc and RNA,
respectively (33,54). Those residues that could play an equivalent role in Lin28 are similarly colored. 
Figure S2) was performed to establish the secondary structure
of TL-let-7g. These results confirm that the free
TL-let-7g adopts a hairpin conformation, with dynamic 
residues in the G-rich bulge (residues 7-11), the adjoining 



Table 2. Dissociation constants (Kd in nM) of mutants of the
NCp7-like domain of Lin28 [Lin28(119-180)] for the TL-let-7g RNA

Protein                  Kd (nM)
Lin28(119-180)           1.3 ± 0.3
Lin28(136-180)           >5000
Lin28(119-140)           >2500
Lin28(119-160)           >5000
Lin28(119-180) (KR-)     >5000
HIV-1 NCp7               1.1 ± 0.3



internal loop (residues 16-19 and 32-35) and the hairpin 
loop (residues 24-27; Figure 5A). For the ribonuclease 
protection assay, RNase T1 was selected because it
shares the specificity of ZBDs for single-stranded guanines
(55). The results of nuclease mapping in the presence of
increasing concentrations of Lin28(119-180) (Figure 5) clearly
demonstrate that only the G residues within the G-rich
bulge (G8, G10 and G11) and adjacent stems (G4 and
G12) are protected by Lin28(119-180). Only one G residue
(G18) becomes more accessible in the presence of
Lin28(119-180), most likely resulting from destabilization of
the predicted G18-C12 base pair (see Discussion section).
Thus, the ribonuclease protection assay indicates that the
G-rich bulge is the main binding site for the NCp7-like
domain and that the adjoining internal loop is also 
affected by binding. 
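
The protection analysis rests on the ratio of band intensities with and without protein for each guanine. A minimal sketch of that bookkeeping follows; the band intensities and the two cutoffs are hypothetical values chosen for illustration, and the function names are not from the study.

```python
def normalized_sensitivity(i_with_protein, i_without_protein):
    """Intensity ratio (with protein / without protein) for each guanine.

    Both inputs map a residue label to a band intensity; only residues
    present in both lanes are returned.
    """
    return {g: i_with_protein[g] / i_without_protein[g]
            for g in i_without_protein if g in i_with_protein}

def classify(ratios, protect_below, enhance_above):
    """Split residues into protected and enhanced sets using caller-chosen cutoffs."""
    protected = sorted(g for g, r in ratios.items() if r < protect_below)
    enhanced = sorted(g for g, r in ratios.items() if r > enhance_above)
    return protected, enhanced

# Hypothetical intensities (arbitrary units), not measured data.
i0 = {"G8": 100.0, "G18": 50.0, "G34": 80.0}
ip = {"G8": 10.0, "G18": 150.0, "G34": 78.0}
ratios = normalized_sensitivity(ip, i0)
print(classify(ratios, protect_below=0.5, enhance_above=2.0))
```

Leaving the cutoffs as parameters keeps the classification explicit, since the outcome for borderline bands depends on where they are set.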

Detailed mapping of the pre-let-7g determinants for 
Lin28(119-180) binding

Next, we performed an exhaustive EMSA analysis of
TL-let-7g mutants to identify the RNA determinants of
the high-affinity interaction between TL-let-7g and
Lin28(119-180) (Kd = 1.3 nM; Table 3). Since NCp7 and
several other ZBDs specifically recognize single-stranded
nucleic acids (33,54), several mutants of the loop regions
of TL-let-7g (Figure 1D) were investigated. Replacement
of the ACCC hairpin loop by a stable GNRA tetraloop
(GCAA) increases the Kd by a factor of 3, and the
C25A point mutation increases the Kd by a factor of
1.6. Deletion of unpaired nucleotides in the internal loop
to create a stable stem (Figure 1D; Δiloop) causes a
5.3-fold increase in the Kd compared with the wild-type
Figure 5. Footprint analysis of TL-let-7g with RNase T1. (A)
Secondary structure of TL-let-7g with the mapping of in-line probing
and T1 footprinting data. Residues that are the most susceptible to
spontaneous cleavage through in-line attack are in bold
(Supplementary Figure S2), and residues that experience a significant
reduction (IP/I0 = -4) or enhancement (IP/I0 = +4) of T1 cleavage in
the presence of Lin28(119-180) are shaded in red and blue, respectively. (B)
Typical RNA footprinting gel of TL-let-7g in the absence and presence
of Lin28(119-180) (at concentrations of 0, 1, 5, 25, 100, 250, 500 and
1000 nM). Lanes with input TL-let-7g (RNA), an alkaline hydrolysis
ladder (OH-) and a T1 hydrolysis ladder (T1) are also included.
(C) Histogram of normalized band sensitivity (IP/I0, where IP and I0
are, respectively, the intensity in the presence and absence of protein)
for T1 cleavage of each guanine obtained at 25-500 nM Lin28(119-180).



RNA. Similar changes in affinity are also observed with 
simultaneous mutations of three unpaired nucleotides 
from the internal loop (A19C/G34A/G35A: 6.2-fold 
increase) or from the combined effect of two related mu- 
tations (1.2- and 3.8-fold increase for the A19C and 
G34A/G35A mutants, respectively). Thus, the internal loop 
of TL-let-7g makes a minor contribution to Lin28(119-180)
binding, but the hairpin loop appears to be replaceable. 
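
One way to compare the combined mutant with its component mutations is to convert the fold changes in Kd into free-energy differences, which add when effects are independent. The calculation below is an illustration and is not an analysis made in the text: it uses the standard relation ΔΔG = RT ln(Kd,mut/Kd,wt), assumes the 4°C incubation temperature of the binding reactions, and takes the mean fold changes from Table 3 without their errors.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 277.15   # 4 °C, the incubation temperature of the binding reactions

def ddg(fold_change):
    """Free-energy penalty in kcal/mol for a given fold increase in Kd."""
    return R_KCAL * T_KELVIN * math.log(fold_change)

# Mean fold changes from Table 3.
singles = ddg(1.2) + ddg(3.8)   # A19C plus G34A/G35A, if effects were additive
triple = ddg(6.2)               # A19C/G34A/G35A measured together
print(round(singles, 2), round(triple, 2))
```

Given the errors on the underlying Kd values, a small gap between the two numbers should not be over-interpreted.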

Several EMSA results indicate that the G-rich bulge 
contributes significantly to Lin28(119-180) binding. First,
deletion of the G-rich bulge (Figure 1D; Δbulge) has a
substantial effect since no specific binding could be ob- 
served with this mutant at a protein concentration as 



Table 3. Dissociation constants (Kd in nM) of the NCp7-like domain
of Lin28 [Lin28(119-180)] for various mutants of the TL-let-7g RNA

TL-let-7g RNA          Kd (nM)       Kd/[Kd (wt)] (a)
Wild type (wt)         1.3 ± 0.3     1
GNRA tetraloop         4 ± 1         3.1
C25A                   2.1 ± 0.5     1.6
Δiloop                 6.9 ± 0.7     5.3
A19C                   1.6 ± 0.6     1.2
G34A G35A              5 ± 1         3.8
A19C G34A G35A         8 ± 2         6.2
Δbulge                 n.b. (b)      n.b. (b)
G8C G10C G11A G12A     n.b. (b)      n.b. (b)
G8C                    1.1 ± 0.1     0.9
G10C                   9 ± 2         6.9
G11A                   3 ± 1         2.3
G12A                   25 ± 7        19
G10C G12A              18 ± 5        14

(a) Kd/[Kd (wt)] is the ratio of the Kd obtained for the mutant TL-let-7g
over the Kd obtained for the wild-type TL-let-7g RNA.
(b) No specific binding observed. The gel mobility shift assays display
smearing and multiple shifts, indicating non-specific binding.



high as 5 μM. Similarly, no specific binding could be
observed for a mutant in which all guanines at the bulge
were mutated (G8C/G10C/G11A/G12A). To identify
the guanine residues of the G-rich bulge that are important
for Lin28(119-180) binding, each guanine was individually
mutated and the Kd was determined by EMSA
(Table 3). Although the G8C and G11A mutations have
little effect on binding, the G10C and G12A mutations
cause 7- and 19-fold increases in Kd, respectively, which
represent the largest changes observed in this study for
single nucleotide mutations. Surprisingly, the double
G10C/G12A mutation does not completely abolish specific
binding, but instead has a similar effect on binding as the
single G12A mutation, suggesting that the remaining
guanine residues (G8 and G11) may contribute to binding
in the double mutant (G10C/G12A). Thus, it appears that
G10 and G12 are key residues for the recognition, but that
other G residues may also contribute to the affinity of the
G-rich bulge for Lin28(119-180). The G-rich bulge clearly
represents the main RNA determinant of the high-affinity
interaction with Lin28(119-180), although the internal loop of
TL-let-7g makes a minor contribution to binding. 

To provide additional evidence for the importance of 
the G-rich bulge, we performed NMR studies of Lin28(119-180)
in complex with TL-let-7g mutants (Supplementary
Figure S3). As expected, the 1H-15N HSQC spectrum of
the G34AG35A/Lin28(119-180) complex is almost identical
to that of the TL-let-7g/Lin28(119-180) complex, particularly
for residues from the two ZBDs. In contrast, the 1H-15N
HSQC spectrum of the Δbulge/Lin28(119-180) complex indicates
that, at the high concentration (0.1 mM) used for
these NMR studies, Lin28(119-180) interacts with the
Δbulge mutant, but in a different manner than observed
for the wild-type TL-let-7g. Thus, these NMR results
confirm that the G-rich bulge is the main determinant
for high-affinity binding of Lin28(119-180) to TL-let-7g.

Given the importance of the G-rich bulge for Lin28(119-180)
binding, we examined the sequences of all known
pre-let-7g RNAs (Supplementary Figure S4). We find
that the sequences of the G-rich bulge and adjacent
residues are highly conserved in mammals, birds and amphibians
and fit the consensus sequence U*AGGGU (* is
A, C or G), with the UGAGGGU sequence found in
mouse and human being the most common.
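
As an illustration of how the U*AGGGU consensus (* is A, C or G) could be searched for in a sequence, the following minimal sketch translates it into a regular expression. The function name `find_bulge_motif` and the example sequence are hypothetical and are not taken from the sequence alignment in Supplementary Figure S4.

```python
import re

# U*AGGGU with * restricted to A, C or G, as stated in the text.
CONSENSUS = re.compile(r"U[ACG]AGGGU")


def find_bulge_motif(sequence):
    """Return (start_index, matched_motif) pairs for the consensus.

    The input is upper-cased and any T is read as U, so DNA-style
    sequences can also be scanned. Indices are 0-based.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in CONSENSUS.finditer(rna)]


# Hypothetical test sequence containing the mouse/human variant UGAGGGU.
example = "CCUGAGGGUAA"
print(find_bulge_motif(example))
```

A sequence with U at the variable position (for example UUAGGGU) is not reported, because the consensus as given allows only A, C or G there.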



DISCUSSION 

In this work, we systematically identified the key determinants
of the interaction between pre-let-7g and Lin28
using highly purified proteins and RNAs. A surprising
result is the high affinity (Kd of 0.15 nM) measured for
Lin28 binding to pre-let-7g, given that a Kd of 1-2 µM
has previously been reported for this interaction (20,38).
One important factor that may explain the higher affinity
measured in this study is the absence of RNA competitor
in the binding buffer. We also took great care to remove
RNA contaminants during the purification of the Lin28
proteins and found it necessary to use the S7 nuclease.
Nevertheless, our results are in general agreement with
previous studies, which established that the extended
terminal loop of pre-let-7g is the binding site for Lin28
(20,38).

The EMSA data for the binding of full-length Lin28 to 
pre-let-7g and TL-let-7g could be fitted well by using the 
Hill equation, but not the classical one-site binding 
equation (Figure 2C). The Hill coefficients of ~2.8
indicate that these interactions involve a minimum of
three binding sites for Lin28 on the target RNA, and
most likely reflect positive cooperativity for these
binding events (56). The concept that Lin28 can bind
multiple sites on the RNA is further supported by
supershifts observed at higher protein concentration
(>5 nM; Figure 2A). Since the Hill coefficients obtained
for TL-let-7g and pre-let-7g are essentially identical, we
propose that full-length Lin28 can cooperatively bind a
minimum of three sites in the extended terminal loop of
pre-let-7g. This ability of Lin28 to cooperatively bind
multiple sites is not observed with the isolated 
NCp7-like domain and could not be identified from our 
binding data with the isolated CSD due to severe aggre- 
gation problems, as previously reported (31). The CSD of 
Lin28 is similar to bacterial cold shock proteins (30), and 
other members of this family of proteins are reported to 
display cooperativity and weak specificity (57). For 
example, the RNA chaperone CspA is known to destabil- 
ize RNA secondary structure by cooperatively binding to 
single-stranded regions with low sequence specificity (58). 
Thus, although the CSD of Lin28 does not contribute 
significantly to the affinity of Lin28 to pre-let-7g, it may 
mediate cooperative binding in the context of the 
full-length protein. 
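
To make the difference between the two binding models concrete, the sketch below compares the classical one-site equation with the Hill equation. It is a minimal illustration and not the fitting procedure used for Figure 2C: the function names are hypothetical, and using the 0.15 nM value as the half-saturation concentration together with a Hill coefficient of 2.8 is an assumption made for the example.

```python
def one_site(conc, kd):
    """Fraction of RNA bound for a single non-cooperative site."""
    return conc / (kd + conc)


def hill(conc, k_half, n):
    """Fraction of RNA bound according to the Hill equation.

    k_half is the concentration giving half-maximal binding and n is
    the Hill coefficient; n = 1 reduces to the one-site equation.
    """
    return conc**n / (k_half**n + conc**n)


def transition_width(n):
    """Ratio of the concentrations giving 90% and 10% bound.

    For the Hill equation this ratio equals 81**(1/n), so a larger
    Hill coefficient gives a sharper binding transition.
    """
    return 81.0 ** (1.0 / n)


K_HALF_NM = 0.15  # assumed half-saturation concentration (nM)
HILL_N = 2.8      # Hill coefficient quoted in the text

for conc in (0.05, 0.15, 0.45):
    print(
        f"{conc:.2f} nM: one-site {one_site(conc, K_HALF_NM):.2f}, "
        f"Hill {hill(conc, K_HALF_NM, HILL_N):.2f}"
    )

print(f"90%/10% concentration ratio, n = 1:   {transition_width(1.0):.1f}")
print(f"90%/10% concentration ratio, n = 2.8: {transition_width(HILL_N):.1f}")
```

With a Hill coefficient near 2.8, the transition from 10% to 90% bound spans roughly a 5-fold concentration range instead of the 81-fold range of a one-site model, which is one way to see why EMSA data with a steep transition may not be fitted well by the one-site equation.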

The C-terminal domain of Lin28 displays remarkable
similarities with the NCp7 protein of HIV-1. The
sequence alignment between the NCp7 protein and
metazoan Lin28 proteins shows a high degree of similarity,
which involves several residues from the KR-rich
domain and the two ZBDs of NCp7. Several of these
residues have been shown to contribute to RNA binding
in NMR structures of NCp7 bound to RNA hairpins
derived from the HIV-1 Ψ site (33,54). Here, both NMR
and mutational studies confirmed that the KR-rich
domain and both ZBDs of Lin28 participate in TL-let-7g
binding. In particular, truncation of the KR-rich
domain or mutations of K/R residues within the
KR-rich domain of Lin28(119-180) abrogate high-affinity
binding to TL-let-7g. Similarly, truncation of the
KR-rich domain from HIV-1 NCp7 was previously shown
to prevent the specific binding of NCp7 to its Ψ-site RNA
target (59).

The similarity between Lin28(119-180) and the NCp7
protein also extends to their binding affinity and specificity.
Indeed, both Lin28(119-180) and NCp7 display low
nanomolar affinities toward their specific RNA targets
(60,61) and specifically recognize a G-rich single-stranded
region. Furthermore, we find that Lin28(119-180) and NCp7
bind TL-let-7g with similar affinities under the same conditions.
Lin28(119-180) preferentially binds the G10-X-G12
unit of the G-rich bulge, but may also bind G8-X-X-G11
when G10 and G12 are simultaneously mutated. Our
results are compatible with a recent study in which
Lin28 was found to strongly protect residues G8, G10,
G11 and G12 of the G-rich bulge of pre-let-7g from
ribonuclease cleavage (38). The sequence at the G-rich bulge of
pre-let-7g is highly conserved in mammals (Supplementary
Figure S4), and sequence similarity was also found in this
region for most human and mouse let-7 family members
(19,20,24). Thus, it is likely that this region is important
for Lin28 binding to other pre-let-7 miRNAs. NCp7 from
HIV-1 was also reported to bind exposed guanines with its
two ZBDs (33,54,60,62,63). Its target sequences in the
Ψ-site RNA are generally located in a hairpin loop,
where they form a G-X-G motif, but exceptions such as
the binding to the 1×3 internal loop of SL1 demonstrate
flexibility in RNA recognition by NCp7 (60,63). This
flexibility in target sequence recognition may be inherent to the
adaptive nature of the NCp7 motif and may explain, in
part, why Lin28 binds a wide variety of mRNA targets in
addition to let-7 precursors (35,36,64,65). Lin28(119-180)
possibly recognizes its RNA target in the same way that
HIV-1 NCp7 binds hairpin loops in the HIV Ψ RNA
(33,54). In this model, the two ZBDs of Lin28 would
each bind an exposed guanine in the G-rich bulge and
the KR-rich domain would bind an adjacent stem or
enlarged major groove. Interaction of Lin28(119-180) with
a domain adjacent to the G-rich bulge is consistent with
our mutational studies, which indicate that the internal
loop of TL-let-7g makes a small contribution to binding.
Given all these similarities between NCp7 and Lin28(119-180),
it is tempting to speculate on a role for HIV-1 NCp7 in
let-7 biogenesis and a common origin for NCp7 and the
C-terminal domain of Lin28. The latter is inconsistent
with a classical evolutionary model that does not include
viruses, but agrees with an alternative model in which
viruses play an important role in cellular evolution (66,67).

The high affinity of Lin28 toward TL-let-7g is mostly
due to its C-terminal NCp7-like domain, which has been
shown to be required for several functional aspects of
Lin28. It was shown to be required for Lin28-mediated
inhibition of pre-let-7g processing in vivo (20), its localization to
P-bodies (31), its specific binding to let-7 precursors
(20,24) and for the Lin28-mediated uridylylation of
pre-let-7a-1 by TUT4 (22). However, the NCp7-like
domain is not sufficient for Lin28 function, as previously
demonstrated with two Lin28 homologs, Lin28B and
Lin28B-S, which are overexpressed in human
hepatocellular carcinoma and in several cancer lines (68).
Lin28B-S preserves the NCp7-like domain but
contains a truncation of the cold-shock domain. It has
been shown that the overexpression of Lin28B-S does
not induce cancer cell proliferation, in contrast to what is
observed with Lin28B (68). In addition, Lin28B-S does
not inhibit the processing of pri-let-7g as Lin28 and
Lin28B do (21). Thus, although the NCp7-like domain of
Lin28 likely contributes to its in vivo function through
high-affinity and specific binding to the terminal loop of
pre-let-7g, the cold-shock domain is required for Lin28 to
function as an effective oncogene and inhibitor of let-7
biogenesis.

Given its high affinity for pre-let-7g, the NCp7-like
domain may be responsible for the initial targeting of
pre-let-7g. After this initial binding event, partial unfolding
of the terminal loop of pre-let-7g would make it more
accessible for binding multiple copies of Lin28. Both the
NCp7 domain and the CSD have been previously
described as RNA chaperones (58,69) and could contribute
to making the terminal loop more accessible. In agreement
with this role of the NCp7-like domain in Lin28, its
binding to TL-let-7g makes G18 of the internal loop more
accessible to ribonuclease cleavage. The high affinity and
specificity of the NCp7-like domain for the G-rich bulge
may also allow Lin28 to bind its RNA target in an orderly
fashion to ensure that important functional regions of the
RNA are protected from binding of miRNA processing
enzymes. The G-rich bulge is directly adjacent to one of
the Dicer processing sites. Thus, it is likely that Lin28
binding at the G-rich bulge protects the pre-let-7g RNA
from Dicer cleavage at this site, probably by both steric
hindrance and destabilization of the stem region near the
G-rich bulge (38).

Our mutagenesis, T1 footprinting and NMR data all
indicate that the G-rich bulge of TL-let-7g is the main
determinant for high-affinity binding to Lin28(119-180). In an
apparent contradiction with our results, a previous report
identified a different G-rich region of pre-let-7 (GGAG
residues 34-37 of the internal loop in Figure 5A) to be
important for Lin28 binding (24). Both these G-rich
regions are highly conserved in mammalian let-7g, and
similar G-rich regions are found both at the 5'-end and
3'-end of the terminal loop in most members of the human
and mouse let-7 family (19,20,24). Thus, it is likely that
both G-rich regions in pre-let-7 miRNAs are important
for binding full-length Lin28. For example, initial binding
of Lin28 to the G-rich bulge may expose internal loop
residues and allow Lin28 binding to the other G-rich
region. Alternatively, the levels of Lin28 and other cellular
factors may affect binding of Lin28 to the two G-rich
regions. Interestingly, both hnRNP A1 and KSRP specifically
bind G-rich sequences within the terminal loop of
pre-let-7a-1 (17,18) and may regulate Lin28 binding at
these sites in some pre-let-7 members. Clearly, further
studies are needed to more precisely determine how each
G-rich region contributes with cellular factors to regulate
pre-let-7 biogenesis.

SUPPLEMENTARY DATA 
Supplementary Figures S1-4, Supplementary References
[71-73].